
IMPLICIT  COALITIONS  IN  A  GENERALIZED  PRISONER'S  DILEMMA 

ABSTRACT 

The presence of a third party can affect attempts by two players to cooperate in a three-player,
continuous-alternative, repeated prisoner's dilemma-like game. If the third player is
uncooperative, two players may find it advantageous to cooperate implicitly, at a level
somewhere between full (i.e., three-way) cooperation and full defection. We examine this
phenomenon of implicit coalitions via two sequential computer tournaments (38 algorithms
in tourney 1, 44 algorithms in tourney 2). In both tournaments, each with a different payoff
function, the ability to recognize and/or encourage implicit coalitions seems to be a key
indicator of success. This result holds up in a test of robustness. We also examine other
properties, including those identified earlier by Axelrod (1980a, b). Detailed tournament
results are given.


MORE  THAN  TWO  PLAYERS 

At  a  time  when  the  superpowers  appear  to  be  moving  toward  some  degree  of  cooperation  on 
nuclear  weapons,  there  is  a  growing  concern  about  the  nuclear  capability  among  a  number  of  nations 
in  the  Middle  East  (Barnaby  1987).  One  concern  is  whether  the  presence  of  a  non-cooperating  outside 
player  will  encourage  or  discourage  cooperation  among  the  superpowers.  It  is  not  yet  clear  how  these 
outside  players  will  affect  the  level  of  cooperation  that  the  superpowers  might  achieve. 

Consider  the  dramatic  effect  of  outside  players  on  OPEC.  After  a  decade  of  highly  profitable 
cooperation  (collusion)  the  cartel  collapsed,  partially  because  of  increased  production  by  non-OPEC 
nations  such  as  the  United  Kingdom.  Member  nations  began  to  cheat  more  and  more  from  the  agreed 
price  and  production  guidelines,  reaching  a  climax  last  year  when  Saudi  Arabia  was  no  longer  willing 
to  be  the  sole  cooperator.  Early  in  1986,  the  Saudis  finally  gave  up  their  attempts  to  maintain 
cooperation  and  began  increasing  output  and  offering  price  discounts  in  an  attempt  to  "punish"  the 
United  Kingdom.  Oil  prices  dropped  as  low  as  $8/barrel.  Recently,  the  Saudis  and  OPEC  have 
attempted to stabilize prices at a more moderate level than before, finally acknowledging the critical
role of outside nations.

In another example, firms in the U.S. microelectronics industries, adversely affected by the
growing  influence  and  economic  power  of  foreign  competition,  have  formed  the  Microelectronics  and 
Computer  Technology  Corporation  to  cooperate  on  basic  and  applied  research  (Griffin  1987). 
Although  members  risk  loss  of  competitive  research  advantage  relative  to  other  U.S.  firms,  the 
potential  gains  may  well  justify  the  almost  $50  million  investment. 

Whether  or  not  it  is  in  the  social  good  to  encourage  such  coalitions,  it  is  important  to  understand 
how  coalitions  can  influence  the  development  of  effective  strategies  in  games  involving  more  than 
two  players. 

This paper addresses the role of implicit coalitions in a repeated, generalized prisoner's dilemma
(GPD). The classical prisoner's dilemma, a two-player, two-act game, captures the essential conflict
between unilateral incentives (i.e., more sales through price discounts) and group incentives (price
restraint and higher profits). The GPD extends this basic form of conflict to a richer, more
complicated setting: an N-player game with many possible actions, either discrete (e.g., number of
warheads) or continuous (e.g., price levels or R&D investment). After formalizing the GPD and
motivating implicit coalitions, we describe two competitive strategy tournaments in the spirit of
Axelrod (1980a, b). Results of the tournaments illustrate the importance of implicit coalitions in a
repeated GPD. We describe one strategy that seems to encourage coalitions and we test its robustness
across a series of environments which vary from very "nice" to very "nasty."


FORMALIZATION 

We begin with a review of the classic prisoner's dilemma (PD) and then motivate and formalize
our generalizations. We illustrate our formalization using the GPD game used in the first
competitive strategy tournament.

The Classical Prisoner's Dilemma

Over its 30-year lifespan, the PD has been one of the most frequently studied phenomena in
economics, political science, sociology, and psychology. See Axelrod (1984) for a review of these and
other applications of the PD.

The classic 2x2 PD allows each player to either cooperate (C) or defect (D). If both players
cooperate in a given period, then each is rewarded with a payoff of r points. If one player defects from
mutual cooperation, he receives the temptation payoff of t, while the cooperating player gets the
"sucker's payoff" of s. If both choose to defect, then each receives the punishment payoff of p.
Table 1 illustrates a typical set of payoffs for the 2x2 PD.

Table 1: Classic Prisoner's Dilemma*

                              Player 2
                      C_2                 D_2

Player 1    C_1       r_1 = 3             s_1 = 0
                      r_2 = 3             t_2 = 5

            D_1       t_1 = 5             p_1 = 1
                      s_2 = 0             p_2 = 1

* Subscripts on actions (C, D) and payoffs (r, t, p, s) indicate the player.

A  quick  scan  of  Table  1  reveals  that  each  player  has  the  unilateral  incentive  to  defect,  regardless 
of  the  other  player's  decision,  but  if  the  two  players  cooperate,  both  achieve  higher  scores.  Thus,  if 
this  were  a  one-shot  game,  each  would  be  best  off  defecting  since  there  would  be  no  incentive  to 
deviate  from  that  action.  However,  if  the  game  were  repeated,  strategies  might  change. 

In general, the PD property holds if the payoffs r, t, s, and p meet certain constraints. The
essential property, once again, is that each player has a dominant alternative (to defect), but if both
defect, the resulting payoff (p) is less than the payoff for mutual cooperation (r). Specifically,

1) Regardless of what our opponent does, we are best off defecting. If he cooperates we prefer to
defect (i.e., t>r), and if he defects we still favor defection (p>s).

2) Regardless of what option we choose, we are better off if our opponent is lenient and chooses
his dominated alternative (cooperation). Thus, r>s for when we cooperate, and t>p for when
we defect.

3) Mutual cooperation is always preferred to mutual defection: r>p.


These three sets of inequalities can be combined into one compound inequality: t>r>p>s. This is
the heart of the PD. When the game is repeated, some researchers add a fourth condition to
discourage oscillations:

4) Continued cooperation is better than alternating between cooperation and defection: 2r > t + s.
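
As a quick arithmetic check, the four conditions can be tested directly against the payoffs of
Table 1. The following is a minimal sketch; the function name is_prisoners_dilemma and its
repeated flag are illustrative labels introduced here, not part of the original formulation.

```python
def is_prisoners_dilemma(t, r, p, s, repeated=False):
    """Check conditions 1-3 (and condition 4 when `repeated` is True)."""
    dominant_defection = t > r and p > s        # condition 1
    lenient_opponent_helps = r > s and t > p    # condition 2
    cooperation_beats_defection = r > p         # condition 3
    ok = (dominant_defection and lenient_opponent_helps
          and cooperation_beats_defection)
    if repeated:
        ok = ok and 2 * r > t + s               # condition 4
    return ok


# Payoffs from Table 1: t = 5, r = 3, p = 1, s = 0.
print(is_prisoners_dilemma(t=5, r=3, p=1, s=0, repeated=True))
```

With t = 7 instead of 5, conditions 1 through 3 still hold, but 2r = 6 < t + s = 7, so taking
turns at defection would pay more than steady cooperation and condition 4 fails.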

Generalized Prisoner's Dilemma (GPD)

In order to study implicit coalitions among N players we use the PD framework to balance
unilateral incentives with group cooperation. We do not claim that all conflict situations are PD's; we
claim only that many interesting situations are consistent with the PD paradigm. Thus, we state a
set of conditions which apply to N players and which, in the case of two players, reduce to the classical
PD conditions.

Several researchers have proposed and analyzed N-player PD's, usually to study the behavior of
large groups or entire communities. For example, each person finds it easier to litter than to carry
paper to a wastebasket, but society as a whole is better off if no one litters. This scenario is often
referred to as "The Tragedy of the Commons," first proposed by Hardin (1968). See also Hamburger
(1973), Goehring and Kahan (1976), Taylor (1976), Dawes (1980), and Schelling (1973). The primary
tools of analysis for many players are the payoff functions, C(n) and D(n), which describe the payoffs
to each cooperator and each defector when exactly n parties cooperate.

While these N-player models are an excellent way to study situations involving many players
facing a binary alternative, we are more concerned with games involving fewer players and more
alternatives. For example, we wish to study how two cooperators should respond to a defector when
all three players have a continuous range of alternatives available to them. We seek to determine,
among other things, whether they should continue to cooperate, switch to defection, or take some
action between cooperation and defection. To address these issues our generalization must deal with
continuous actions.

We define the GPD in terms of payoffs, P, and actions, A. In particular, let P_i(A_1, A_2, ..., A_N) be
the payoff to player i if the N players take actions A_1 through A_N. We assume the payoffs are
symmetric.*

First we define two key actions, the short-term, non-cooperative, payoff-maximizing action, A^d,
and the joint-payoff-maximizing action, A^c. A^d and A^c correspond to D and C in the classical PD. Each
player can maximize its payoffs by choosing A^d, regardless of the actions of the other players. At the
other extreme, if all players cooperate and choose the same action, A^c will maximize joint payoffs.


* This is not a critical assumption. For example, positive linear transformations which vary by player do not affect our
analysis. By symmetric we mean P_i(..., A_i, ..., A_j, ...) = P_j(..., A_j, ..., A_i, ...).


For simplicity of exposition we first consider games in which A^d is fixed and invariant with
respect to competitors' actions. We relax this assumption in the second tournament.

In some games, cooperation means less action: fewer weapons, less quantity produced, or less
aggression. In other games, cooperation means more action: more missiles removed from the
European theater, higher prices, or more joint research. Without loss of generality, we consider the
latter class of games and assume A^d is less than A^c.

We now define a GPD. Because our players are symmetric, we state the conditions for player 1. It
is understood that each condition applies to all players.

1) As long as A_1 > A^d, player 1 increases its short-term payoff by defecting further:

$$\frac{\partial P_1}{\partial A_1} < 0 \quad \text{for } A_1 > A^d.$$

An alternative interpretation is that unilateral movement toward cooperation decreases
payoffs. This condition generalizes t > r and p > s in the classical PD. Note that by the
definition of A^d as the payoff-maximizing action, we implicitly assume that payoffs decrease
when actions are decreased below A^d.

2) Any movement toward unilateral cooperation by an opponent increases the payoff to player 1:

$$\frac{\partial P_1}{\partial A_j} > 0 \quad \text{for } j = 2, 3, \ldots, N.$$

This condition generalizes r > s and t > p. Note that it applies for all feasible actions by all of
player 1's competitors.

3) Mutual cooperation is profitable: If all players increase their actions by the same amount, all
are better off (as long as no actions exceed A^c):

$$\frac{\partial P_1}{\partial A_1} + \frac{\partial P_1}{\partial A_2} + \cdots + \frac{\partial P_1}{\partial A_N} > 0 \quad \text{for } A_j \le A^c \text{ for all } j.$$

This condition generalizes r > p. Note that we have defined condition 3 for all actions, not just
symmetric actions and, by the definition of A^c as the joint-payoff-maximizing action, we have
assumed implicitly that condition 3 reverses for all A_j above A^c.

4) We wish to rule out profitable oscillations in an analogy to 2r > t + s. There are many possible
generalizations to this condition; we choose a simple one by making it unattractive to take
turns reducing actions unilaterally. That is,

$$\frac{d}{da}\Big[ P_1(A_1 - a, A_2, \ldots, A_N) + P_1(A_1, A_2 - a, \ldots, A_N) + \cdots + P_1(A_1, A_2, \ldots, A_N - a) \Big] < 0$$

$$\text{for } A^d \le A_j - a, \quad A_j \le A^c, \quad j = 1, 2, \ldots, N.$$

An Example

Our first tournament was framed in terms of a triopoly where scores correspond to profits and the
actions are prices. For realism we chose the commonly used "constant elasticity" model of consumer
response to prices in a differentiated triopoly. The parameters of the model, the elasticities, were
chosen to be consistent with empirical estimates for a variety of markets (e.g., Telser 1962; Lambin,
Naert, and Bultez 1975; Lambin 1976; Simon 1979). We assumed "constant returns to scale" and
chose scaling constants so that the payoffs were easy to understand.

Specifically, the payoff function we used was (in terms of player 1):

$$P_1 = 3375 \, A_1^{-3.5} A_2^{0.25} A_3^{0.25} (A_1 - 1) - 480 \qquad (1)$$

A little calculus (taking the derivative of (1) and setting it equal to zero) yields the non-cooperative
payoff-maximizing action, A^d = 1.40, which is independent of competitive actions. By assuming
A_1 = A_2 = A_3, we can solve for the cooperative action, A^c = 1.50. The reader can verify that conditions
1 through 4 hold for equation (1).
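
The two optima can be reproduced in a few lines. The sketch below assumes that equation (1)
reads P_1 = 3375 A_1^(-3.5) A_2^(0.25) A_3^(0.25) (A_1 - 1) - 480, a reading that is consistent
with the values 1.40 and 1.50 quoted above; the names payoff, OWN_ELASTICITY, and
CROSS_ELASTICITY are illustrative and do not come from the tournament code.

```python
OWN_ELASTICITY = 3.5      # assumed exponent on the player's own action
CROSS_ELASTICITY = 0.25   # assumed exponent on each competitor's action


def payoff(own, rival_a, rival_b):
    """Per-round payoff under the assumed reading of equation (1)."""
    return (3375.0 * own ** -OWN_ELASTICITY
            * rival_a ** CROSS_ELASTICITY * rival_b ** CROSS_ELASTICITY
            * (own - 1.0) - 480.0)


# Non-cooperative optimum: maximize own^-3.5 * (own - 1), giving 3.5 / 2.5.
A_D = OWN_ELASTICITY / (OWN_ELASTICITY - 1.0)

# Cooperative optimum: with all three players at a common action A, the payoff
# is proportional to A^-(3.5 - 2 * 0.25) * (A - 1), giving 3.0 / 2.0.
joint = OWN_ELASTICITY - 2 * CROSS_ELASTICITY
A_C = joint / (joint - 1.0)

print(A_D, A_C)
print(round(payoff(A_C, A_C, A_C), 2), round(payoff(A_D, A_D, A_D), 2))
```

Because the competitors' actions enter only as a multiplicative factor, the first-order condition
for A^d does not involve them, which is why A^d is invariant in this game.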

When we restrict actions to A^d and A^c, the game defined by equation (1) becomes the classical PD.
Suppose, for the sake of illustration, that players 2 and 3 are committed to choose the same action as
each other. Table 2 shows the possible payoffs under this restriction.

Table 2: Example payoffs when players 2 and 3 choose identical actions.

                                    Players 2 and 3
                        A_2 = A_3 = A^c        A_2 = A_3 = A^d

Player 1   A_1 = A^c    P_1 = 20               P_1 = 3
                        P_2 = P_3 = 20         P_2 = P_3 = 21

           A_1 = A^d    P_1 = 29               P_1 = 12
                        P_2 = P_3 = 11         P_2 = P_3 = 12

These payoffs, although asymmetric because P_1(A^c, A^d, A^d) ≠ P_2(A^d, A^c, A^c), clearly obey the
constraints for the classical PD. Of course this restriction on players 2 and 3 is not realistic, nor is it
imposed in the tournaments. Table 2 simply illustrates the close relationship between the classical
PD and the GPD.

Before proceeding to our analysis of implicit coalitions, we note one more important feature of the
GPD model, the envious price. Many researchers have noted that human players in experimental PD
games often defect in an attempt to beat their rivals rather than to score well for themselves. In the
GPD, a distinct action, A^e, is associated with this type of behavior.

The envious action is defined as the action that maximizes one player's share of total payoffs. It is
consistent with the notion of difference maximization as discussed by Shubik (1959). Any player who
misses the main point of the game (i.e., maximize own score) and instead plays to maximize share of
total payoffs will frequently choose the envious action.

In the game based on equation (1), the envious action is calculated to be A^e = 15/11 = 1.36. In an
oligopoly, managers might choose an envious action if they are rewarded on outcomes relative to
other firms in the industry (i.e., bonuses based on market share).

Note that in the short run, players rarely have a legitimate incentive to choose A^e. They can
usually do better for themselves by raising actions from A^e to A^d. However, A^e might prove useful as a
severe punishment for non-cooperative behavior.*
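
One way to see where 15/11 comes from is sketched below. It rests on two assumptions that are not
spelled out in the text: that equation (1) reads P_1 = 3375 A_1^(-3.5) A_2^(0.25) A_3^(0.25) (A_1 - 1) - 480,
and that the envious action is evaluated at a point where all three players choose the same action.
At such a point a player's share of total payoffs is stationary when a small change in its own action
moves its own payoff and each rival's payoff by the same amount, which for this payoff function
reduces to 3.5 - 2.5A = 0.25(A - 1), or A = 3.75/2.75 = 15/11. The helper names are illustrative.

```python
def payoff(own, rival_a, rival_b):
    """Per-round payoff under the assumed reading of equation (1)."""
    return (3375.0 * own ** -3.5 * rival_a ** 0.25 * rival_b ** 0.25
            * (own - 1.0) - 480.0)


def own_effect(a, h=1e-6):
    """Central difference: change in a player's payoff per unit of its own action."""
    return (payoff(a + h, a, a) - payoff(a - h, a, a)) / (2 * h)


def rival_effect(a, h=1e-6):
    """Central difference: change in a rival's payoff per unit of that player's action."""
    return (payoff(a, a + h, a) - payoff(a, a - h, a)) / (2 * h)


A_E = 15.0 / 11.0
# The two effects should agree at the envious action (up to numerical error).
print(own_effect(A_E), rival_effect(A_E))
```

Both effects are positive at 15/11: the player could still raise its own payoff by moving up toward
A^d, but doing so would help its rivals just as much.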

IMPLICIT  COALITIONS 

Suppose player 3 in a repeated three-player game has chosen a strategy of consistently choosing
A^d. How should players 1 and 2 react?

One option is to punish the defector by reciprocating his totally non-cooperative behavior. For
example, Axelrod (1981) showed that this type of strategy (ALL-A^d) is a best response to itself in the
two-player game. However, in the three-player game, maintaining two-way cooperation at A^c also
may not be a very good alternative, as illustrated in Table 2 [P_1(A^c, A^c, A^d) = 11 < 12 =
P_1(A^d, A^d, A^d)].

Because actions are not limited to A^d and A^c, players 1 and 2 have other options. They may find it
best to choose some other action, somewhere between A^d and A^c, that yields payoffs greater than
three-way mutual defection. If they cooperate properly, their (mutual) motivation is to choose an
action which maximizes their joint payoff against the defecting third player. We call this action the
implicit coalition action, A^{ic}. For a three-player game with player 3 as the defector, it is defined (for
player 1) as:

$$P_1(A^{ic}, A^{ic}, A_3) = \max_{A} P_1(A, A, A_3)$$

In the example above, A^{ic} = 13/9 = 1.444 for any third-player action. In general, the best
coalition price will depend on the third player's action, but in this first GPD game it is invariant, just
like A^d.
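
The invariance of the coalition action can be checked numerically. The sketch below assumes that
equation (1) reads P_1 = 3375 A_1^(-3.5) A_2^(0.25) A_3^(0.25) (A_1 - 1) - 480; the function
coalition_action and its grid parameters are illustrative choices. When players 1 and 2 move together,
each one's payoff is proportional to A^(-3.25) (A - 1), so the maximizer is 3.25/2.25 = 13/9 whatever
player 3 does.

```python
def payoff(own, rival_a, rival_b):
    """Per-round payoff under the assumed reading of equation (1)."""
    return (3375.0 * own ** -3.5 * rival_a ** 0.25 * rival_b ** 0.25
            * (own - 1.0) - 480.0)


def coalition_action(third, lo=1.0, hi=2.0, steps=9000):
    """Grid-search the common action of players 1 and 2 that maximizes
    each coalition member's payoff against a fixed third-player action."""
    grid = [lo + (hi - lo) * k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda a: payoff(a, a, third))


closed_form = (3.5 - 0.25) / (3.5 - 0.25 - 1.0)
for third in (1.30, 1.40, 1.50):
    print(third, round(coalition_action(third), 3), round(closed_form, 3))
```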

In an N-player game there are N-2 possible coalition sizes corresponding to coalitions of 2, 3, ...,
N-1 players. (One might also wish to define A^d and A^c as coalition prices for coalitions of 1 and N
players, respectively.)

Table 3 illustrates the impact of implicit coalitions. Notice that for a fixed action by player 3, the
best cooperative response by players 1 and 2 is always A^{ic}. Furthermore, the subgame between
players 1 and 2 is itself a two-player PD where cooperation becomes A^{ic} while defection is still A^d.


* Abreu (1986) has recently proposed a class of strategies known as "carrot and stick" strategies which use severe
punishments (as low as the envious action and even lower) as a credible threat to enforce maximally collusive behavior.


Table 3: Illustration of payoffs when actions are limited to A^c, A^{ic}, and A^d.
(The first, second, and third lines refer to the payoffs to players 1, 2, and 3, respectively.)

              A_2 = A^c     A_2 = A^c     A_2 = A^c     A_2 = A^{ic}  A_2 = A^{ic}  A_2 = A^d
              A_3 = A^c     A_3 = A^{ic}  A_3 = A^d     A_3 = A^{ic}  A_3 = A^d     A_3 = A^d

A_1 = A^c     20.00         15.30         11.45         10.65          6.83          3.05
              20.00         15.30         11.45         22.44         18.54         20.54
              20.00         27.20         29.25         22.44         24.46         20.54

A_1 = A^{ic}  27.20         22.44         18.54         17.72         13.85         10.01
              15.30         10.65          6.83         17.72         13.85         15.84
              15.30         22.44         24.46         17.72         19.72         15.84

A_1 = A^d     29.25         24.46         20.54         19.72         15.84         11.98
              11.45          6.83          3.05         13.85         10.01         11.98
              11.45         18.54         20.54         13.85         15.84         11.98
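
The entries of Table 3 can be regenerated from the payoff function. The sketch below assumes that
equation (1) reads P_1 = 3375 A_1^(-3.5) A_2^(0.25) A_3^(0.25) (A_1 - 1) - 480 and uses A^c = 1.5,
A^{ic} = 13/9, and A^d = 1.4; the names ACTIONS and payoffs are illustrative. Values computed this way
may differ from the printed table by about 0.01 in a few cells, presumably because of rounding in the
original computation.

```python
from itertools import combinations_with_replacement


def payoff(own, rival_a, rival_b):
    """Per-round payoff under the assumed reading of equation (1)."""
    return (3375.0 * own ** -3.5 * rival_a ** 0.25 * rival_b ** 0.25
            * (own - 1.0) - 480.0)


def payoffs(a1, a2, a3):
    """Payoffs to players 1, 2, and 3 for one round."""
    return (payoff(a1, a2, a3), payoff(a2, a1, a3), payoff(a3, a1, a2))


ACTIONS = {"c": 1.5, "ic": 13.0 / 9.0, "d": 1.4}

# Rows: player 1's action. Columns: unordered (A_2, A_3) pairs in the order
# (c, c), (c, ic), (c, d), (ic, ic), (ic, d), (d, d).
for name_1 in ACTIONS:
    for name_2, name_3 in combinations_with_replacement(ACTIONS, 2):
        cell = payoffs(ACTIONS[name_1], ACTIONS[name_2], ACTIONS[name_3])
        print(f"{name_1:>2} {name_2:>2} {name_3:>2}",
              " ".join(f"{x:6.2f}" for x in cell))
```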

Three  Potential  Strategies 

At this point it is clear that there may be some motivation for implicit coalitions to form. We have
not demonstrated whether or not it is advantageous to play strategies that seek to form coalitions in
repeated games. Nonetheless, we propose three strategies for the repeated GPD that recognize and
use the concept of implicit coalitions. The first, COALITION, limits actions to A^c, A^{ic}, and A^d. The
second, COALENC, uses the continuous nature of the action set to encourage coalitions. The third,
GENERIC, is a generalization of the first two and proves useful when we describe the tournaments.
Without loss of generality, we continue to state the algorithms from the perspective of player 1.

COALITION is the simplest possible implicit coalition strategy. It begins each game at A^c. In
later rounds it does the following:

$$\text{COALITION:} \quad A_1(t) = \begin{cases}
A^c & \text{if } A_2, A_3 \ge A^c \\
A^{ic} & \text{if } \max\{A_2, A_3\} \ge A^{ic} \text{ and } \min\{A_2, A_3\} < A^c \\
A^d & \text{if } A_2, A_3 < A^{ic}
\end{cases}$$

where A_1(t) is player 1's action in round t, and A_2 and A_3 are actions in round t-1.
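
A minimal sketch of COALITION follows. It assumes the thresholds read as follows: play A^c when both
competitors were at or above A^c, play A^{ic} when at least one was at or above A^{ic} but not both
were at A^c, and play A^d otherwise. The numeric values are those of the first tournament game, and
the function signature (with None standing for "no previous round") is an illustrative choice.

```python
A_C, A_IC, A_D = 1.5, 13.0 / 9.0, 1.4


def coalition(prev_2=None, prev_3=None):
    """COALITION's action for player 1, given the competitors' actions in
    the previous round (None in the first round of a game)."""
    if prev_2 is None or prev_3 is None:
        return A_C                      # begin each game cooperatively
    if min(prev_2, prev_3) >= A_C:
        return A_C                      # both competitors cooperated fully
    if max(prev_2, prev_3) >= A_IC:
        return A_IC                     # at least one potential coalition partner
    return A_D                          # no partner left: defect


print(coalition(), coalition(1.5, 1.4), coalition(A_IC, 1.4), coalition(1.4, 1.4))
```

Under this reading, two COALITION players facing a constant defector settle at A^{ic} after the first
round and stay there, since each sees one competitor at A^{ic} and one below it.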

COALENC is similar to COALITION, except that it recognizes and tries to take advantage of the
fully continuous nature of the action set:

$$\text{COALENC:} \quad A_1(t) = \begin{cases}
\min\{A_2, A_3, A^c\} & \text{if } A_2, A_3 \ge A^{ic} \\
A^{ic} & \text{if } \max\{A_2, A_3\} \ge A^{ic} \text{ and } \min\{A_2, A_3\} \le A^{ic} \\
\max\{A_2, A_3, A^d\} & \text{if } A_2, A_3 \le A^{ic}
\end{cases}$$

The basic idea of COALENC is to maintain total cooperation (at A^c) if both opponents are equally
cooperative, and to hedge towards the implicit coalition price, A^{ic}, at all other times.


Finally, we acknowledge that more complex responses are possible in the three ranges of
competitors' actions. Indeed, such complex algorithms were entered in the tournaments. We define
GENERIC as a generic implicit coalition strategy, where f_1( ) and f_2( ) are general functions
mapping the actions in t-1 (or earlier) onto the ranges [A^{ic}, A^c] and [A^d, A^{ic}], respectively:

$$\text{GENERIC:} \quad A_1(t) = \begin{cases}
f_1(A_2, A_3) & \text{if } A_2, A_3 \ge A^{ic} \\
A^{ic} & \text{if } \max\{A_2, A_3\} \ge A^{ic} \text{ and } \min\{A_2, A_3\} \le A^{ic} \\
f_2(A_2, A_3) & \text{if } A_2, A_3 \le A^{ic}
\end{cases}$$
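
GENERIC can be sketched as a higher-order function, with COALENC as one instance. The sketch assumes
that all three branch thresholds are A^{ic} and that f_1 and f_2 see only the previous round; the
first-round move is not handled, and the names generic and coalenc are illustrative.

```python
A_C, A_IC, A_D = 1.5, 13.0 / 9.0, 1.4


def generic(prev_2, prev_3, f1, f2):
    """GENERIC's action for player 1 given the competitors' previous actions.
    f1 should map into [A_IC, A_C] and f2 into [A_D, A_IC]."""
    if min(prev_2, prev_3) >= A_IC:
        return f1(prev_2, prev_3)       # both at or above the coalition level
    if max(prev_2, prev_3) <= A_IC:
        return f2(prev_2, prev_3)       # both at or below the coalition level
    return A_IC                         # one above, one below


def coalenc(prev_2, prev_3):
    """COALENC as a special case of GENERIC (assumed reading of the rule)."""
    return generic(prev_2, prev_3,
                   f1=lambda a2, a3: min(a2, a3, A_C),
                   f2=lambda a2, a3: max(a2, a3, A_D))


print(coalenc(1.5, 1.5), coalenc(1.5, 1.4), coalenc(1.42, 1.41), coalenc(1.3, 1.35))
```

The branches overlap when a competitor sits exactly at A^{ic}; for COALENC this is harmless because
every applicable branch then returns A^{ic}, but other choices of f_1 and f_2 would need an explicit
tie-breaking rule.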

THE  TOURNAMENT  APPROACH 

Given the importance of the PD and its extension, the GPD, it is natural to try to find a "best"
strategy for a GPD game that is repeated over many rounds. (In the repeated game, we assume that
the payoffs in any one round depend only on the actions in that round, but each player can observe the
previous actions by their competitors.) Unfortunately, as Axelrod (1981) showed for the classical PD,
there is no single best strategy. Against different sets of competitors, different strategies may be best.
For example, ALL-A^d is the (unique) best response to a pair of players choosing ALL-A^d, but
COALITION is a best response (although not unique) if competitors are known to be playing
COALITION.

Axelrod (1980a, b) pioneered a methodology aimed at identifying strategies for the classical PD
that perform well against a wide range of competitors. In order to generate a rich environment,
Axelrod sponsored a contest, inviting game theorists to submit strategies in the form of computer
subroutines for a repeated PD game. Each entry "played" every other entry in a round robin
tournament. The objective was to earn the highest total score across all games. Entries could be
simple or complex; some participants even created strategies that tried to identify opponents'
strategies and then act appropriately against them.

The winner was the simplest entry of all, TIT FOR TAT, which starts cooperatively and in each
subsequent period does whatever its opponent did in the previous round. Axelrod described several
key properties that helped distinguish the most successful strategies, such as niceness, which means
"never be the first to defect."

A second tournament was run soon after the first tournament's results were tallied. This time
Axelrod received 62 entries from participants representing a wide range of ages, disciplines, and
geographic origins. The winner, once again, was TIT FOR TAT, suggesting that its first-round
victory was no fluke.

The second tournament reconfirmed the importance of niceness. Axelrod also identified pivotal
properties such as forgiveness (i.e., do not be too severe when punishing opposing defections),
provocability (i.e., never let an opposing defection go unacknowledged), and lack of envy (i.e., do not
intentionally try to reduce competitors' scores). In essence, it pays to cooperate.

Axelrod's  tournaments  have  provoked  praise  and  criticism,  but  they  have  raised  a  number  of 
interesting  ideas.  We  seek  to  apply  the  tournament  methodology  to  study  implicit  coalitions  in  the 
GPD. 

MITCS1: THE FIRST GPD TOURNAMENT

In November 1984 we announced our first tournament (named MITCS1, for MIT Competitive
Strategy Tournament), similar in design to Axelrod's second tournament, but featuring the
Generalized Prisoner's Dilemma. The game was posed as a managerial problem with price as the sole
strategic variable. Each game in the tourney was the repeated GPD game defined in equation (1)
with three programmed strategies choosing actions each period from a continuous range. As in
Axelrod's tournament, each possible grouping of entries engaged in five repeated games, and the
overall winner was the strategy that amassed the highest total score across all games in which it
participated. Contestants were given full information about the payoff function used, and in every
game each player had access to the past actions of all three players in that game. Entries were
submitted in the form of FORTRAN IV subroutines.
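
The tournament schedule described above can be sketched as follows. The sketch assumes that a
"grouping" is an unordered set of three distinct entries, which the text does not state explicitly
(groupings containing copies of the same entry may also have been played); the entry names are
placeholders.

```python
from itertools import combinations
from math import comb

GAMES_PER_GROUPING = 5

entries = [f"entry_{k}" for k in range(1, 6)]   # five placeholder entries

# Every unordered triple of distinct entries plays five repeated games.
schedule = [(grouping, game)
            for grouping in combinations(entries, 3)
            for game in range(1, GAMES_PER_GROUPING + 1)]

print(len(schedule))                              # games for five entries
print(comb(38, 3), comb(38, 3) * GAMES_PER_GROUPING)
```

Under the same assumption, 38 distinct entries would give comb(38, 3) = 8436 groupings, each entry
appearing in comb(37, 2) = 666 of them.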

By  July  1985,  we  had  received  over  40  algorithms  (including  several  duplicates)  from  a  diverse 
group  of  participants  around  the  world.  The  field  of  entrants  included  economists,  political  scientists, 
game  theorists,  marketing  academics,  and  managers.  Several  universities  and  major  corporations 
submitted the best and most creative entries they had found after running their own
mini-tournaments. Thus, the pool of algorithms available for this first empirical analysis of the GPD
contains a wide variety of creative efforts from some very strategically-minded people.

Description  of  Entries 

Many entrants, having learned Axelrod's lessons, attempted to generalize TIT FOR TAT. Six
strategies recognized implicit coalitions and incorporated the implicit coalition action, A^{ic}, into their
algorithms. We label these algorithms IC for implicit coalition. Several IC entries fit into the
GENERIC framework, including some that used very complex functions for f_1 and f_2, involving many
of the previous decisions of each competitor, not just their most recent actions.

Most algorithms tried to incorporate continuous alternatives, but participants did so in a variety
of ways, including:

MIN:     Start at A^c. In all subsequent rounds, choose the minimum of your competitors'
         actions from the previous round.

MAX:     Start at A^c. In all subsequent rounds, choose the maximum of your competitors'
         actions from the previous round.

AVG:     Start at A^c. In all subsequent rounds, choose the average of your competitors'
         actions from the previous round.

Each of these strategies leads to very different types of behavior. MAX is extremely forgiving but
not highly provocable. Two MAXs, playing together against a nasty competitor, will remain at A^c,
ignoring the exploitative moves by the third player. On the other hand, MIN is extremely
competitive, and raises its action only if both competitors do so first. AVG is the most moderate of the
three, trying to balance forgiveness and provocability at the same time. (Axelrod himself entered
AVG into the tournament.)
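
The contrast among the three rules can be seen in a short simulation against a constant defector.
This is a sketch only: the helper play, the rule names, and the five-round horizon are illustrative,
and A^c = 1.5 and A^d = 1.4 are taken from the first tournament game.

```python
A_C, A_D = 1.5, 1.4


def min_rule(rivals):
    return min(rivals)


def max_rule(rivals):
    return max(rivals)


def avg_rule(rivals):
    return sum(rivals) / len(rivals)


def all_d(rivals):
    return A_D                          # constant defector, ignores its rivals


def play(rules, rounds, first_actions):
    """Return the action history; each rule sees only its two competitors'
    actions from the previous round."""
    history = [list(first_actions)]
    for _ in range(rounds - 1):
        prev = history[-1]
        history.append([rule([prev[j] for j in range(3) if j != i])
                        for i, rule in enumerate(rules)])
    return history


start = [A_C, A_C, A_D]
for name, rule in (("MAX", max_rule), ("MIN", min_rule), ("AVG", avg_rule)):
    print(name, play([rule, rule, all_d], rounds=5, first_actions=start)[-1])
```

Two MAXs keep echoing each other's cooperative action and never react to the defector; two MINs drop
to A^d in the second round; two AVGs drift toward A^d, halving their distance to it each round.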

A slightly more complicated generalization of TIT FOR TAT is MXCM (pronounced MAXCUM)
which also starts at A^c, but in later rounds it mimics the previous action of the opposing strategy with
the greatest cumulative score at that point in the game. Thus, MXCM does not attempt to use the
previous actions of both competitors; it considers only the stronger of the two (in terms of cumulative
score) and adopts the passive mimicking strategy, just like TIT FOR TAT in the 2-player game. Like
AVG, MXCM is both provocable and forgiving since it follows any actions made by the leading
strategy. However, MXCM is distinguished from AVG because it is able to ignore ineffective
strategies. In contrast, AVG will always give equal weight to the actions of both competitors,
regardless of how well they perform.
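
The MXCM rule for rounds after the first can be sketched in a few lines. The argument layout (lists
indexed by player, with `me` marking the MXCM player) is an illustrative choice, and ties in
cumulative score are broken in favor of the lower-indexed competitor, which the text does not specify.

```python
def mxcm(prev_actions, cumulative_scores, me):
    """Copy the previous action of the competitor with the greatest
    cumulative score so far (rounds after the first only)."""
    rivals = [j for j in range(len(prev_actions)) if j != me]
    leader = max(rivals, key=lambda j: cumulative_scores[j])
    return prev_actions[leader]


# Player 0 runs MXCM; player 2 leads on cumulative score, so its action is copied.
print(mxcm(prev_actions=[1.5, 1.45, 1.4], cumulative_scores=[40.0, 30.0, 50.0], me=0))
```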

Other entrants made no attempt to generalize TIT FOR TAT. Some used different variants of a
strategy suggested by Friedman (1971) that begins each game at A^c and stays there until any
competitor defects, in which case it goes to A^d and stays there for all subsequent rounds. We label this
type of strategy XTRM.
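
A basic XTRM variant can be sketched as below. It assumes that "defects" means any competitor action
below A^c in any earlier round; individual entries may have used different triggers, and the function
name and history format are illustrative.

```python
A_C, A_D = 1.5, 1.4


def xtrm(rival_history):
    """rival_history holds one (A_2, A_3) pair per earlier round.
    Cooperate until any competitor has ever gone below A_C, then defect forever."""
    triggered = any(min(pair) < A_C for pair in rival_history)
    return A_D if triggered else A_C


print(xtrm([]), xtrm([(1.5, 1.5)]), xtrm([(1.5, 1.49), (1.5, 1.5)]))
```

Because the whole history is scanned, a single early cut keeps the strategy at A^d even if both
competitors later return to A^c.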

A few participants chose constant strategies (e.g., always cooperate (ALL-A^c), always defect
(ALL-A^d), or always be envious (ALL-A^e)), or random (RND) strategies which chose actions randomly
from the range [A^d, A^c] or used a random walk technique. Hence most of the algorithms could be
classified into one of eight broad categories: MIN, MAX, AVG, MXCM, IC, XTRM, constant action, or
RND.

Beyond  these  general  descriptions  of  strategy  types,  the  entries  differed  due  to  specific  tactics  or 
features  that  were  frequently  employed.  For  example: 

Following Axelrod (1984), strategies which start the game cooperatively and are never the first to
cut action below A^c are termed nice, as opposed to nasty strategies, which can be the first to defect.

Self-awareness  allows  strategies  to  consider  the  previous  decisions  of  all  three  players  (not  just 
the  two  competitors)  when  choosing  actions.  This  feature  tended  to  reduce  cycling  and  echo  effects. 

Many strategies restricted their actions to the range [A^d, A^c], since there is no way to increase
payoffs by choosing actions outside of this range. Strategies that were willing to go below A^d or above
A^c are known as unbounded.

Some strategies tried to induce cooperation by raising actions slightly above the level specified by
a general strategy type (e.g., play MIN but add a few cents to the minimum action). These strategies
have action-raising initiative.

On  the  other  hand,  some  strategies  have  action-cutting  initiative,  that  is,  they  were  willing  to  go 
below  the  specified  action  level,  usually  in  an  attempt  to  punish  an  earlier  cut  made  by  a  competitor. 

Finally, several strategies occasionally used the envious action A^e to try to outscore their
competitors (rather than maximizing their own score).

These  descriptions  are  admittedly  vague.  The  actual  implementation  of  some  of  these  features 
can  vary  greatly  from  strategy  to  strategy.  For  instance,  action-raising  algorithms  can  vary  the 
magnitude and frequency of their increases. For ease of exposition, we ignore fine-grain differences
among  algorithms,  since  the  mere  presence  of  a  particular  feature  was  generally  more  important 
than  the  manner  in  which  it  was  implemented. 
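The standard heuristics and the boundedness feature described above can be sketched as follows. This is only an illustrative sketch, not any entrant's actual code: the function names are hypothetical, and the bounds 1.4 and 1.5 are placeholder values chosen for the example, since this section does not state the MITCS1 values of A^d and A^c.

```python
# Illustrative sketch of the MIN, MAX, and AVG heuristics with optional bounding.
# All names and the numeric bounds are assumptions made for this example.

def standard_min(opponent_actions):
    """MIN: match the lower of the two opponents' previous actions."""
    return min(opponent_actions)

def standard_max(opponent_actions):
    """MAX: match the higher of the two opponents' previous actions."""
    return max(opponent_actions)

def standard_avg(opponent_actions):
    """AVG: play the mean of the two opponents' previous actions."""
    return sum(opponent_actions) / len(opponent_actions)

def bounded(rule, lower, upper):
    """Wrap a rule so its output is clamped to [lower, upper]."""
    def play(opponent_actions):
        return max(lower, min(upper, rule(opponent_actions)))
    return play

A_D, A_C = 1.4, 1.5            # placeholder bounds (assumed, for illustration only)
previous = [1.30, 1.50]        # one opponent cut deeply, the other cooperated

print(standard_min(previous))                 # unbounded MIN follows the deep cut
print(bounded(standard_min, A_D, A_C)(previous))  # bounded MIN stops at the lower bound
print(standard_avg(previous))
```

A self-aware variant would simply include the player's own previous action in the list passed to the rule.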

MITCS1 RESULTS AND INTERPRETATION

Table  4  presents  a  summary  of  the  strategies  and  their  performances.  The  strategies  are  ranked 
by  their  average  score  per  round.  For  comparison,  mutual  cooperation  pays  20  units  per  period  to 
each player, while each period of mutual defection (i.e., all three players choosing A^d) yields
approximately  12  units  to  each  player.  (See  Table  3  for  additional  payoff  comparisons.) 

The  winning  algorithm,  entered  by  Terry  Elrod  of  Vanderbilt  University,  was  the  simplest 
possible  IC  strategy,  COALITION. 

Recognizing  implicit  coalitions  proved  to  be  the  single  most  important  factor  in  the  tournament. 
The  top  four  algorithms  in  the  tournament  recognized  the  coalition  property,  and  all  six  IC  strategies 
finished  in  the  top  ten  overall.  Another  interesting  factor  was  how  the  strategies  dealt  with  the 
continuous nature of the actions. Most entrants used simple heuristics (e.g., MIN, MAX, AVG) to
address this problem, with varying degrees of success. The standard averaging strategy (i.e., nice,
bounded AVG with no self-awareness, envy, or action-raising/cutting initiative) finished in 6th place,
easily beating standard MAX (ranked 11th) and standard MIN (12th).

Several  of  the  descriptive  features  were  highly  influential.  First  and  foremost  is  niceness, 
reconfirming  the  findings  of  Axelrod.  Nice  algorithms  were  able  to  reap  great  benefits  by  avoiding 
the short-term temptation to defect. The best nasty strategy played standard AVG most of the time,
but would occasionally make small cuts as long as both competitors remained at A^c. If either
competitor responded to these cuts, this strategy would return to standard AVG for the remainder of
the game. This clever form of exploitation helped make this algorithm far more successful than other
nasty entries, but still could not provide any better than a 21st-place finish.

Other important features were boundedness and lack of envy. Only one successful algorithm ever
exhibited envious behavior, but that strategy (ranked 2nd) would only go to A^e if both competitors were
at or below A^e in the previous round, a fairly rare occurrence. Perhaps if this second-ranked strategy
Table 4. Official MITCS1 Results.

Rank  Entrant                                                    Strategy     Features(1)  Average Score
                                                                 Type                      per Round
1     Terry Elrod                                                IC                        17.182
2     (anonymous)                                                IC           S U R E      17.172
3     Avraham Beja & Shlomo Kalish                               IC           S C          17.172
4     Steve Shugan                                               IC                        17.157
5     (MIT)(2)                                                   AVG(3)       S            17.157
6     Gary A. Lines                                              AVG                       17.104
7     (MIT)                                                      MIN          S R          17.063
8     Steve Shugan                                               IC           S            17.014
9     Beja & Kalish                                              IC           S C          16.927
10    (MIT)                                                      MXCM         S            16.914
11    Terry Elrod; John Roberts                                  MAX                       16.879
12    Gary Gaeth & Gerard Tellis; Terry Elrod; Gary A. Lines     MIN                       16.851
13    James M. Lattin                                            MAX          S R          16.830
14    John A. Cadley                                             XTRM         R            16.720
15    Steve Borgatti                                             MIN          S            16.682
16    Steve Borgatti                                             MIN          C            16.523
17    John A. Cadley                                             XTRM                      16.519
18    (MIT)                                                      (4)          U            16.389
19    Robert Axelrod                                             AVG          U            16.335
20    John Roberts                                               AVG(5)       U            16.305
21    Barbara Bruner & James Giver                               AVG          N S C        16.127
22    Robert F. Bordley                                          MIN          U            15.976
23    Robert E. Marks                                            XTRM(6)      U E          14.535
24    Robert E. Marks                                            XTRM         U E          14.497
25    James M. Lattin                                            ALL-A^c                   14.351
26    Shlomo Maital                                              MXCM         N U          13.944
27    Beja & Kalish                                              RND(7)       N S          13.809
28    Robert E. Marks                                            (8)          N U E        13.763
29    Steve Borgatti                                             MIN          N S C        13.740
30    Roland Rust; Robert F. Bordley; John Roberts               ALL-A^d      N            13.637
31    Beja & Kalish                                              RND(7)       N S          13.575
32    (MIT)                                                      RND(9)       N            13.447
33    Shlomo Maital                                              MXCM         N U          13.384
34    Robert F. Bordley                                          MIN          N U          12.918
35    Kenneth L. Stott, Jr., Francis J. Vasko, & Floyd E. Wolf   ALL-A^e      N S U R E    12.151
36    Robert E. Marks                                            ALL-A^e      N U E         9.909
37    (anonymous)                                                MIN(10)      N U R         9.684
38    (anonymous)                                                AVG(11)      U R C         9.643
Notes to Table 4:

(1)  Default features include niceness, no self-awareness, bounded actions, no action-raising or -cutting initiative, and no
envy. Exceptions are noted as N for nastiness, S for self-awareness, U for unbounded actions, R for action-raising
initiative, C for action-cutting initiative, and E for envy.

(2)  (MIT) denotes an algorithm entered by a member of the MIT community which was not eligible to win the tournament.
Post-tournament testing indicates that the inclusion of these entries does not affect the ordering of the top algorithms.

(3)  Weighted average of all three players' actions, using cumulative scores as weights.

(4)  Mimics previous move of one opponent on odd rounds, other opponent on even rounds.

(5)  Geometric mean of opponents' previous actions.

(6)  Stays at A^c for two opposing defections before going to A^e.

(7)  Random walk centered around (A^d + A^c)/2.

(8)  Mimics  actions  of  one  opponent  chosen  at  random  at  start  of  game;  actions  limited  to  range  XA'.A"^]. 

(9)  Uniform random variable between A^d and A^c.

(10)  Only algorithm to introduce action increases above A^c.

(11)  Only algorithm to introduce action cuts below A^e.


did  not  try  to  battle  envious  competitors  on  their  terms,  it  might  have  been  able  to  win  the 
tournament. 

The value of bounded actions can be seen by comparing standard AVG and MIN (ranked 6th and
12th, respectively) to their equivalent but unbounded counterparts (ranked 19th and 22nd,
respectively). Boundedness was worth nearly 0.80 units per round to AVG (17.104 versus 16.335) and
nearly 0.90 units per round to MIN (16.851 versus 15.976).

Several entrants found self-awareness to be a blessing. For example, some strategies, unlike TIT
FOR TAT, considered their own previous decisions in determining future actions, e.g., averaging
across all three players (ranked 5th) and 3-player MXCM (10th place). But self-awareness was a curse
to others, including those who used it as a ratchet on actions. The algorithms ranked 8th and 15th, for
example, only let their actions move downwards, regardless of the cooperative gestures made by their
competitors.

Little  can  be  said  about  the  effectiveness  of  action-raising  and  action-cutting  initiative.  Some  of 
the  nice,  bounded  entries  were  able  to  encourage  cooperation  and  discourage  cheating  with 
appropriate  rewards  and  penalties,  but  these  successes  were  counterbalanced  by  the  unsuccessful 
strategies  that  brought  on  their  own  demise  by  raising  or  cutting  actions  too  much  at  the  wrong 
times. 

Table  4  seems  to  depict  a  tight  three-way  battle  for  first  place.  However,  it  should  be  noted  that 
each  algorithm  played  in  nearly  1000  three-player  matches  in  each  of  the  five  games  in  the 
tournament.  This  information,  combined  with  the  fact  that  each  match  lasted  approximately  200 
rounds,  implies  that  each  strategy  chose  actions  in  nearly  1  million  total  rounds.  Thus,  a  difference  of 
.01  units  on  a  payoff-per-round  basis  is  equivalent  to  a  10,000  unit  difference  in  total  score. 
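The scale argument above is simple arithmetic, sketched here using the paragraph's approximate figures (1000 matches per game, five games, 200 rounds per match). The variable names are illustrative only.

```python
# Rough scale check using the approximate figures quoted in the text.
matches_per_game = 1000      # "nearly 1000 three-player matches"
games = 5                    # five games in the tournament
rounds_per_match = 200       # "approximately 200 rounds"

total_rounds = matches_per_game * games * rounds_per_match

# A per-round gap of 0.01 units, e.g. 17.182 versus 17.172 in Table 4.
per_round_gap = 0.01
total_gap = per_round_gap * total_rounds

print(total_rounds)
print(round(total_gap))
```

So the apparently tiny margin separating first place from the next two entries corresponds to a gap on the order of ten thousand units in cumulative score.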

An  Alternative  Champion 

The winning algorithm, COALITION, was the only highly-ranked strategy that did not
acknowledge the continuity of actions. Apparently, none of its top rivals could exploit the continuity of
the action space well enough to overcome the winner's discrete simplicity. However, this does not imply
that the task is impossible; COALENC, described earlier, would have easily won the tournament had
it been entered.

Table 5 shows the top ten entries in the revised tournament with COALENC included. Note that
the relative rankings of the original strategies are unchanged, although average scores have
increased because of the presence of the cooperative newcomer. The margin of victory for the new
algorithm is quite significant: the gap between first and second place is larger than the margin
between second and seventh place.

Table 5: Revised MITCS1 Results.

New   Entrant                         Old    Strategy   Features(1)  Average Profit
Rank                                  Rank   Type                    per Round
1     COALENC                         -      IC                      17.353
2     Terry Elrod                     1      IC                      17.257
3     (anonymous)                     2      IC         S U R E      17.249
4     Avraham Beja & Shlomo Kalish    3      IC         S C          17.245
5     Steve Shugan                    4      IC                      17.235
6     (MIT)(2)                        5      AVG(3)     S            17.225
7     Gary A. Lines                   6      AVG                     17.175
8     (MIT)                           7      MIN        S R          17.136
9     Steve Shugan                    8      IC         S            17.101
10    Beja & Kalish                   9      IC         S C          17.000

NOTES:

(1)  Default features include niceness, no self-awareness, bounded actions, no action-raising or -cutting initiative, and no
envy. Exceptions are noted as N for nastiness, S for self-awareness, U for unbounded actions, R for action-raising
initiative, C for action-cutting initiative, and E for envy.

(2)  (MIT) denotes an algorithm entered by a member of the MIT community.

(3)  Weighted average of all three players' actions, using cumulative scores as weights.

More importantly, the success of this algorithm is not very sensitive to variations in the
competitive environment. Extreme changes, such as doubling the presence of all nasty entries,
usually cannot unseat this new winner. Many of the procedures that Axelrod used to demonstrate
the robustness of TIT FOR TAT have been applied to this tournament, with strong results favoring
COALENC.

MITCS2:  THE  SECOND  GPD  TOURNAMENT 

One of the unique aspects of the payoff function used in MITCS1 is separability, which leads to
unique, invariant values for A^d and A^{ic}. Because the implicit coalition action never changes, it is
relatively  easy  for  coalition-seeking  algorithms  to  achieve  their  goal.  In  more  general  situations,  the 
best  action  for  a  coalition  should  depend  on  the  actions  of  non-coalition  players.  For  example,  the 
coalition  response  to  an  envious  player  might  be  harsher  (i.e.,  lower  coalition  action)  than  the 
coalition  response  to  a  small  defection. 
With this in mind, we sought to determine whether the success of COALITION and COALENC
was unique to the payoff function and competitive environment of MITCS1, or whether it could be
replicated in an environment that is potentially less favorable to implicit coalitions.

Soon after we completed the analysis of MITCS1, we announced a second tournament, MITCS2,
with the following payoff function:

$$\Pi_1 = 200\,(8 - 6A_1 + A_2 + A_3)(A_1 - 1) - 180. \tag{2}$$

Equation  (2)  corresponds  to  a  linear  demand  function  in  economics. 

As before, payoffs are symmetric and the equation satisfies the GPD conditions. The scaling
constants were chosen to closely match the payoffs in MITCS1: full cooperation (A_1 = A_2 = A_3 = A^c)
pays 20 units per player per round, and full defection (A_1 = A_2 = A_3 = A^d) pays 12 units per player per
round.
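The two benchmark payoffs can be checked directly from equation (2). In the sketch below, the symmetric cooperative action 1.5 and the symmetric defection action 1.4 are not quoted from the text at this point; they are assumptions obtained by maximizing the joint payoff and the individual payoff, respectively, under symmetry. The function name is illustrative.

```python
# Minimal check of the MITCS2 payoff function, equation (2).

def payoff(a1, a2, a3):
    """Per-round payoff to player 1 given all three actions, per equation (2)."""
    return 200.0 * (8.0 - 6.0 * a1 + a2 + a3) * (a1 - 1.0) - 180.0

A_C = 1.5   # assumed: symmetric action that maximizes joint payoff
A_D = 1.4   # assumed: symmetric action where each player best-responds

full_cooperation = payoff(A_C, A_C, A_C)
full_defection = payoff(A_D, A_D, A_D)
lone_defector = payoff(A_D, A_C, A_C)   # player 1 undercuts two cooperators

print(round(full_cooperation, 3))
print(round(full_defection, 3))
print(round(lone_defector, 3))
```

The third value illustrates the short-term temptation: undercutting two cooperators pays more than the 20 units earned under full cooperation.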

The key difference between MITCS1 and MITCS2 is that the short-term payoff-maximizing
action (A^d) and the implicit coalition action (A^{ic}) now depend on competitors' actions. Specifically, for
fixed competitive actions,

$$A_1^{d} = \frac{14 + A_2 + A_3}{12} \tag{3}$$

and

$$A_1^{ic} = A_2^{ic} = \frac{13 + A_3}{10}. \tag{4}$$

For example, if player 3 chooses ALL-1.40, then A_1^{ic} = A_2^{ic} = 1.440. However, if player 3 chooses
ALL-A^d via equation (3), then A_3^d = 1.407, and A_1^{ic} = A_2^{ic} = 1.441.
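The numerical example above can be reproduced from equations (3) and (4). The sketch below is illustrative only; the function names are hypothetical, and the simple fixed-point iteration is just one convenient way to find the pair of actions at which player 3 best-responds to the coalition while the coalition responds to player 3.

```python
# Sketch: reaction functions from equations (3) and (4), and the worked example.

def defect_action(a_j, a_k):
    """Equation (3): short-term payoff-maximizing action against a_j and a_k."""
    return (14.0 + a_j + a_k) / 12.0

def coalition_action(a_outsider):
    """Equation (4): joint-payoff-maximizing action for two partners facing a_outsider."""
    return (13.0 + a_outsider) / 10.0

def coalition_against_defector(tol=1e-12, max_iter=100):
    """Iterate (3) and (4) until the coalition action stops changing."""
    a_ic = coalition_action(1.40)
    for _ in range(max_iter):
        a3 = defect_action(a_ic, a_ic)
        updated = coalition_action(a3)
        if abs(updated - a_ic) < tol:
            return updated, defect_action(updated, updated)
        a_ic = updated
    raise RuntimeError("iteration did not converge")

# Case 1: player 3 always plays 1.40.
print(round(coalition_action(1.40), 3))

# Case 2: player 3 always best-responds via equation (3).
a_ic, a3 = coalition_against_defector()
print(round(a3, 3), round(a_ic, 3))
```

The fixed point of the second case is A^{ic} = 85/59, which is the value used by the discrete COALITION entry listed in Table 6.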

All entrants were aware of MITCS1 and the success of COALITION and COALENC. Each subset
of three entries was matched for five games of 200 rounds, and the winner was the strategy with the
highest total (or average) payoffs.

Tournament  Results 

By  fall  of  1986,  32  entries  had  been  submitted  to  MITCS2.  Five  strategies  were  thrown  out  due  to 
coding errors or illegal tactics. The remaining 27 entries were combined with 11 strategies carried
over (some with slight modifications) from MITCS1. These strategies were included again because
they led to interesting pricing behavior in the original tournament. Finally, suggestions from other
individuals  who  did  not  wish  to  officially  participate  led  to  6  more  submissions,  thus  rounding  out  the 
field  of  44  unique  entries. 

A  brief  description  of  each  entry  is  shown  in  Table  6,  where  the  entries  are  ranked  by  average 
scores  per  round. 


Since  no  entries  used  any  explicit  end-game  maneuvers,  the  game  length  was  fixed  at  200  rounds  for  all  games. 
Table 6. MITCS2 Official Results (see Notes to Table 6)


Rank 

1 
2 


3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
40 
41 
42 
43 
44 


Entrant 

Robert  E.  Marks 
Robert  L.  Bishop; 
Tony  Haig 
Paul  R.  Pudaite 
John  Hulland 
Neil  Bergmann 
Tony  Haig 
James  M.  Lattin 
(MITCSl  #7)' 
Scott  A.  Neslin 
Scott  A.  Neslin 

4 

Robert  L.  Bishop 

(MITCSl #5) 

Karel  Xajman 
(MITCSl #6i 
James  M.  Lattin 
(MITCSl  #11) 
Karel  Najman 
Terry  Elrod 
James  M  Laltm 
(MITCSl #21) 
Neil  Bergmann 
Chris  Jones 
(MITCSl #10) 
Karel  Najman 
(MITCSl  #17) 


Strategy      N=      Lower   Mean  Score 
Type       Nasty  Bound    per  Round 


James  M.  Lattin 
Karel  Najman 
(MITCSl  #26) 

(MITCSl #35) 
Pel*r  J.  Brock 
Paul  R.  Pudaite 
Karel  Najman 
Paul  R  Pudaite 
Peter  J.  Brock 
James  M.  Lattin 
Peter  J.  Brock 
(MITCSl  #32) 
(MITCSl #36) 


COALENC 

COALEXC 

COALENC 

COALENC 

COALENC 

COALENC 

COALENC 

MIN.R 

AVG.S 

AVG.S 

MXCM.S 

COALENC 

AVG 

AVG.S 

AVG.S 

AVG.S 

AVG 

MAX 

AVG 

COALITION 

MIN 

COALENC 

COALITION 

MIN 

MXCM.S 

MIN 

XTRM 

AVG 

AVG 

MAX 

ALL- 1  44 

MXCM 

XTRM 

ALL-A'^.R 

AVG 

ALL-A*^ 

ALL-A'' 

ALL-A"* 

ALL1.39 

ALL-A-^.C 

RANDOM 

ALLA' 


N 


N 
N 
N 
N 

N 

N 
N 
N 
N 
N 
N 
N 
N 
N 
N 


1.4 
1,4 
1.4 


A" 

1.4 
1.4 


1.4 

Ad 


1.4 

1.4 
1.4 


1.4 

Ad 


1.4 
1.44 

a' 

1.4 
1.4 
1.4 


A" 
1.39 

1.333 


Rank  Mean Score   Description
      per Round
1     17.097       A^{ic} = (26 + A_2 + A_3)/20
2     17.096       Original COALENC (with A^{ic} = 13/9)
3     17.091       A^{ic} = (13 + A_3)/10; looks back two rounds
4     17.085       A^{ic} = (13 + A_3)/10
5     17.084       A^{ic} = (13 + A_3)/10
6     17.075       A^{ic} = (13 + A_3)/10; looks back two rounds
7     17.065       A^{ic} = 1.44
8     17.064       Standard MIN with random 2-cent price increases
9     17.063       Linear learning model: complex averaging procedure
10    17.042       Variation of #9 above (i.e., different parameters)
11    17.025       Mimics previous price of second-best firm
12    16.993       A^{ic} = 13/9; uses max{A^, (13 + A^)/10} when A^ < A^{ic}
13    16.989       Gradually shifts from MAX to MIN as game progresses
14    16.970       Weighted average of all 3 players' previous prices
15    16.968       Unweighted average of all 3 players' previous 3 prices
16    16.964       Unweighted average of all 3 players' previous prices
17    16.932       Standard AVG: average of opponents' previous prices
18    16.932       Complex adaptive learning model
19    16.926       Standard MAX: maximum of opponents' previous prices
20    16.908       Unbounded AVG
21    16.907       Same as official winner of MITCS1 but with A^{ic} = 85/59
22    16.887       Plays AVG in round 2, MIN thereafter
23    16.765       Modified version of top nasty entry in MITCS1
24    16.754       COALITION with varying A^{ic}; A^{ic} = (13 + A_3)/10
25    16.707       Standard MIN: minimum of opponents' previous prices
26    16.679       Mimic previous price of best (most profitable) firm
27    16.567       Unbounded MIN
28    16.187       Play A^c until anyone cuts price; play A^d thereafter
29    16.162       Start at 1.4 then play AVG
30    15.992       Weighted AVG with random weights
31    15.776       Start at 1.4 then play MAX
32    15.719       Always choose 1.44
33    15.264       Start at 1.4 then mimic previous price of best opponent
34    15.226       Choose 1.4 for 2 rounds after a price cut, then return to A^c
35    14.512       Raise price above A^c if profits exceed opponents' profits
36    14.126       Start at 1.4 then choose average of A and (A + A)/2
37    14.033       Choose A^c = (6 + A_2 + A_3)/6 to maximize joint profits
38    13.979       Start at 1.5; thereafter choose A^d
39    13.854       Start at 17/12; thereafter choose A^d
40    13.734       Start at 1.4; thereafter choose A^d
41    13.213       Always choose 1.39
42    13.089       Choose A = (7A - A)/6 to hurt non-ALL-A players
43    12.817       Uniform random variable between 1.333 and 1.5
44    11.754       Act enviously, i.e., maximize share of industry profits
Notes to Table 6:

(1)  The basic strategy types are defined in the descriptions and in the text. The suffix "S" refers to a strategy with
self-awareness, "R" refers to strategies with action-raising initiative, and "C" refers to action-cutting initiative.

(2)  Firm 3 is assumed to be the less cooperative opponent of firm 1.

(3)  This denotes a strategy that was an official entry in MITCS1. The number refers to its ranking in that tournament.

(4)  This denotes a strategy based on an informal suggestion.


Table 6 shows two striking patterns. Most of the COALENC generalizations cluster towards the
top, and 15 out of the 16 bottom entries are nasty (i.e., willing to initiate defections). One pattern that
is not immediately obvious, however, is the possible link between the success of the COALENC
entries and the method of choosing a value of A^{ic}.

The winning entry, submitted by Robert Marks of the Australian Graduate School of
Management, features an unusual type of coalition action. It uses (4) to calculate an A^{ic} against
player 3 and averages this action with an A^{ic} calculated against player 2. Thus, for instance, if A_2 =
1.50 and A_3 = 1.40, this algorithm would act like COALENC with A^{ic} = (1.44 + 1.45)/2 = 1.445, as
compared to an A^{ic} of 1.44 that (4) would suggest (and most COALENC entries would use).
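The averaging rule in this example can be written out in a few lines. This is a sketch of the coalition-action calculation only, not of the winning entry's full logic; the function names are hypothetical.

```python
# Sketch: the averaged coalition action versus the standard one from equation (4).

def coalition_action(a_outsider):
    """Equation (4): coalition action when a_outsider is treated as the non-cooperator."""
    return (13.0 + a_outsider) / 10.0

def averaged_coalition_action(a2, a3):
    """Average of the coalition actions computed against each opponent in turn.
    Algebraically equal to (26 + a2 + a3) / 20."""
    return (coalition_action(a2) + coalition_action(a3)) / 2.0

def standard_coalition_action(a2, a3):
    """Coalition action against the less cooperative (lower-priced) opponent only."""
    return coalition_action(min(a2, a3))

a2, a3 = 1.50, 1.40
print(round(averaged_coalition_action(a2, a3), 3))
print(round(standard_coalition_action(a2, a3), 3))
```

Because the averaged rule also gives weight to the more cooperative opponent, it can never fall below the standard rule's target.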

At first glance this may seem like an inefficient rule, since it will often lead to coalitions with an
action slightly above the "optimal" A^{ic}. But notice which routine came in a close second: the original
version of COALENC with A^{ic} = 13/9 = 1.444... . This is also a relatively high coalition action; it will
exceed the A^{ic} suggested by (4) whenever the non-cooperative player is below 1.444... . A pattern
emerges: the top two strategies consistently choose higher coalition actions than any of the other
COALENC entries. As further evidence, note that the "worst of the best," entry #7, will generally
choose the lowest A^{ic}, 1.440.

In the next sub-section we examine this high A^{ic} phenomenon further, but now we briefly
summarize some of the other results of interest. First, notice the rather mediocre performance of the
entries that attempt to generalize COALITION, as compared to its sterling performance in MITCS1.
Part of this drop can be attributed to the different mix of strategies in MITCS2 compared to MITCS1:
with the presence of more sophisticated entries (such as the COALENC generalizations), the discrete
pricing policy begins to hurt COALITION. This is particularly true when action-cutting exists at
moderate levels. But much of COALITION's drop is due to the new payoff function: without a fixed
A^{ic} to rely on, any coalition-seeker must be more flexible and forgiving in trying to establish a
successful coalition.

Another prominent result from MITCS1 was the need for a lower bound on one's actions. Most
entrants to MITCS2 recognized this idea and used one of two lower bounds: fixed at 1.40 or floating
(A^d). The results in Table 6 show no significant advantage for one method or the other. For example,
entries #4 and #5 are exactly the same except for their lower bounds, and in each of the five
constituent games in the tourney, these entries finish with nearly identical scores. This finding
should not be considered all too surprising; after all, when action cutting is severe enough to
require bounded actions, A^d is usually quite close to 1.40 anyway.

Finally, another result worth mentioning is the relative performance of three standard
algorithms. In MITCS2, just as in MITCS1, AVG (entry #17) earns higher payoffs than MAX (#19),
and both beat out MIN (#25). The value of having bounded actions can be seen once again by
comparing entry #17 to #20 and #25 to #27. Boundedness does not appear to be as valuable as in
MITCS1, but this is only because of the smaller number of extreme action cutters. Only three entries
(#41, #43, and #44) ever initiate cuts below 1.40.

A  New  Alternative  Champion 

MITCS2 confirms the importance of the implicit coalition phenomenon, but suggests that
algorithms can be fine-tuned to achieve greater payoffs. In fact, the higher the target coalition action,
the better the algorithm seems to perform. We call this new property magnanimity.

The success of the magnanimous entries seems to result from the fact that a high A^{ic} is less likely
to be viewed as a non-cooperative action. In contrast, a less magnanimous algorithm, e.g., #7, often
will be mistaken for a defector. Matchups between #7 and discrete COALITION strategies with
higher A^{ic}'s will quickly degenerate into (A^d, A^d, A^d) behavior because the two potential cooperators
cannot agree on a coalition action. Of course, there is a limit to magnanimity; too high a coalition
action allows an algorithm to be exploited.

To generate a slightly more magnanimous strategy, we included another potentially beneficial
property, self-awareness, in the A^{ic} calculation. If each player incorporates its own previous action
in determining what A^{ic} to choose, the resulting coalition action will tend to be higher and more
stable. No longer would different subsets of players seek different coalition actions; environments
with general payoff functions would become more like the MITCS1 world, where stable coalitions are
easily established and maintained.

Our new strategy, named CEAVG3 (for coalition encourager based on the average of all 3
coalition actions), is still a COALENC strategy; only the coalition reaction function is different. In
CEAVG3, instead of calculating and averaging our A^{ic} against players 2 and 3, we perform the same
task with respect to all three players. The new coalition reaction function, therefore, is:

$$A_1^{ic} = \frac{39 + A_1 + A_2 + A_3}{30}. \tag{5}$$

Although the coalition actions and payoffs for the new strategy are only slightly higher than
those of entry #1, this small increase combined with the moderating influence of the lagged A_1 term
helps the new strategy to achieve a first-place finish when placed among the MITCS2 entries. Table 7
shows the revised payoff figures. (Only the top five entries are shown; the overall standings are
barely affected by the presence of the new strategy.)
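To make the comparison concrete, the sketch below evaluates the reaction function in equation (5) next to the two-opponent average used by entry #1. The function names and the sample actions are illustrative assumptions, not values taken from the tournament.

```python
# Sketch: CEAVG3's coalition reaction function (5) versus entry #1's rule.

def entry1_action(a2, a3):
    """Entry #1: average of the coalition actions against players 2 and 3."""
    return (26.0 + a2 + a3) / 20.0

def ceavg3_action(a1_previous, a2, a3):
    """Equation (5): average of the coalition actions against all three players,
    including the player's own previous action."""
    return (39.0 + a1_previous + a2 + a3) / 30.0

# Illustrative previous-round actions (assumed values).
a1_previous, a2, a3 = 1.50, 1.45, 1.40

print(round(entry1_action(a2, a3), 4))
print(round(ceavg3_action(a1_previous, a2, a3), 4))
```

Comparing the two formulas algebraically, (5) gives the higher target exactly when the player's own previous action exceeds the mean of its opponents' actions, which is the situation a cooperative player faces after a rival cuts. The lagged own-action term also damps round-to-round swings, since only two thirds of the weight falls on the opponents' latest moves.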
Table 7: Revised MITCS2 Results

Rank  Entrant                       Strategy   N =     Lower   Mean Score   Description
                                    Type       Nasty   Bound   per Round
1A    CEAVG3                        COALENC            A^d     17.170       A^{ic} = (39 + A_1 + A_2 + A_3)/30
1     Robert E. Marks               COALENC            A^d     17.163       A^{ic} = (26 + A_2 + A_3)/20
2     Robert L. Bishop; Tony Haig   COALENC            1.4     17.160       Original COALENC (with A^{ic} = 13/9)
3     Paul Pudaite                  COALENC            1.4     17.156       A^{ic} = (13 + A_3)/10; looks back two rounds
4     John Hulland                  COALENC            1.4     17.149       A^{ic} = (13 + A_3)/10

Since CEAVG3 is a COALENC strategy, its behavior (and payoffs) will often be
indistinguishable from those of the other COALENC entries. However, in the cases where these entries do differ,
CEAVG3 does well enough to win the revised MITCS2 tournament by a relatively comfortable
margin.

Test  of  Robustness 

No  tournament  can  tell  us  which  single  strategy  is  truly  "best,"  or  which  set  of  strategies  will  do 
well  in  the  widest  set  of  environments.  But  a  series  of  tournaments  coupled  with  some  reasoning  can 
raise  some  valid  hypotheses  and  insights. 

COALENC strategies did well in both MITCS1 and MITCS2, but these tournaments represent
relatively "nice" environments. To determine the sensitivity of COALENC strategies to
environments, we performed a test of robustness, similar to the post-tournament work of Axelrod
(1984). We generated 200 new environments using different combinations of the MITCS2 entries. We
first used a stepwise procedure to identify a subset of eight representative entries that faithfully
reproduce the overall payoffs and standings of MITCS2, using only a small fraction of the full
tournament. The eight representatives (#7, 16, 20, 23, 28, 33, 37, and 41) form an environment
involving 36 games with each of the MITCS2 entries, but yield overall average payoffs that have a
correlation coefficient of 99.4% with the scores from the full tournament (5175 games per entry).

To  generate  each  simulated  environment,  we  took  random  combinations  of  each  of  the  eight 
representatives,  also  accounting  for  the  residuals  between  the  actual  and  mini-tournament  payoffs. 
This  procedure  was  repeated  200  times,  thereby  producing  a  wide  range  of  environments. 

As a proxy for the niceness or nastiness of each environment, we use the average payoffs across all
45 entries. The 200 environments are sorted by this index and broken into five equal-sized groups.
We ranked the score for each strategy within each group. Table 8 summarizes the results by giving
the top ten finishers in each environment. For ease of reference, the basic strategy types are shown
below for each listed entry. (The entry numbers refer to MITCS2 rankings.)
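The grouping procedure described above can be sketched roughly as follows. This is a loose illustration under several assumptions, not the authors' procedure: the weighting scheme (uniform random weights normalized to sum to one) is a guess, the residual adjustment mentioned in the text is omitted, and the payoff numbers in `toy_scores` are invented placeholder data rather than tournament results.

```python
import random

def simulate_environments(scores, n_env=200, n_groups=5, seed=0):
    """scores maps entry name -> list of average payoffs against each representative.
    Returns the sorted, grouped environments and a ranking of entries per group."""
    rng = random.Random(seed)
    n_reps = len(next(iter(scores.values())))
    environments = []
    for _ in range(n_env):
        raw = [rng.random() for _ in range(n_reps)]
        total = sum(raw)
        weights = [x / total for x in raw]          # assumed weighting scheme
        env = {name: sum(w * s for w, s in zip(weights, row))
               for name, row in scores.items()}
        niceness = sum(env.values()) / len(env)     # average payoff across entries
        environments.append((niceness, env))
    environments.sort(key=lambda pair: pair[0])     # nastiest first, nicest last
    size = n_env // n_groups
    groups = [environments[i * size:(i + 1) * size] for i in range(n_groups)]
    rankings = []
    for group in groups:
        mean = {name: sum(env[name] for _, env in group) / len(group)
                for name in scores}
        rankings.append(sorted(mean, key=mean.get, reverse=True))
    return groups, rankings

# Invented placeholder data: payoffs of three hypothetical entries against
# three hypothetical representatives.
toy_scores = {
    "coalition_like": [16.0, 17.0, 18.0],
    "avg_like": [15.8, 16.5, 17.5],
    "nasty_like": [15.5, 15.0, 14.0],
}
groups, rankings = simulate_environments(toy_scores)
print([len(g) for g in groups])
print(rankings[0], rankings[-1])
```

With real data, `scores` would hold each of the 45 entries' payoffs against the eight representatives, and the top ten names in each group's ranking would correspond to a column of Table 8.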
Table 8: Strategy Performance in Simulated Environments

                   Moderately                        Moderately
Nastiest           Nasty            Mid-range        Nice             Nicest
entry  score       entry  score     entry  score     entry  score     entry  score
11     16.186      1A     16.866    1A     17.364    1A     17.833    12     18.867
1A     16.162      2      16.861    2      17.357    1      17.826    21     18.821
2      16.159      1      16.860    1      17.356    2      17.825    1A     18.780
1      16.155      3      16.859    3      17.352    3      17.822    2      18.774
3      16.150      4      16.853    9      17.351    5      17.817    1      18.774
4      16.144      5      16.852    4      17.346    4      17.816    11     18.771
5      16.143      6      16.841    5      17.346    8      17.809    3      18.766
6      16.132      7      16.839    10     17.343    6      17.806    5      18.762
7      16.106      8      16.819    8      17.337    9      17.804    4      18.761
9      16.105      9      16.818    6      17.335    7      17.800    24     18.757

entry  type        entry  type      entry  type
1A     COALENC     5      COALENC   10     AVG.S
1      COALENC     6      COALENC   11     MXCM.S
2      COALENC     7      COALENC   12     COALENC
3      COALENC     8      MIN.R     21     COALITION
4      COALENC     9      AVG.S     24     COALITION


Table  8  shows  a  clear,  consistent  pattern  supporting  the  results  of  Tables  6  and  7.  The  top 
strategies  are  very  stable  in  moderate  environments,  and  fall  only  slightly  in  more  extreme  cases.  It 
is  encouraging  to  see  that  the  COALENC  entries  perform  so  well  even  in  very  nasty  environments. 
Even  in  the  single  nastiest  environment,  where  over  60  percent  of  the  random  weight  is  allocated  to 
the  nasty  representatives,  four  COALENC  entries  finish  in  the  top  ten,  and  only  one  nasty  strategy 
finishes in the top 25.

One surprise that emerged out of the simulations is entry #11. This strategy is based on a very
unusual notion: it identifies the second-best player in each game (in terms of cumulative payoffs) and
mimics that player's previous action. This rule adapts very well to extreme environments (good or
bad) since it goes along with coalitions in a most magnanimous way (good in nice environments) but
never initiates coalition behavior (good in nasty environments). If we look at alternative measures of
performance, such as number of first-place finishes in the 200 simulations, then #11 appears to be
even stronger. It is the winner in 50 of the environments, more than any other MITCS2 entry.

SUMMARY 

This  paper  has  examined  the  role  of  implicit  coalitions  in  a  generalized  prisoner's  dilemma.  We 
find  the  GPD  interesting  because  it  extends  the  classical  PD  to  more  realistic  situations  of  more  than 


Although entry #11 is most adept at winning, it does have its bad moments. It finishes out of the top ten 40.5% of the
time, including a low of 29th place in one environment. CEAVG3, for comparison, is far more robust, with only 15.5% of its
rankings below the top ten, never lower than 16th place.
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two  (but  not  many)  players  and  it  gives  players  the  option  of  choosing  actions  from  a  continuous  set. 

When  we  extend  the  PD  to  the  GPD  we  find  the  possibility  of  implicit  coalitions,  that  is,  coalitions 
of  cooperating  players  in  an  otherwise  unfriendly  world.  We  also  expect,  intuitively,  that  strategies 
that  use  the  continuous  nature  of  the  action  space  will  do  better  than  those  that  do  not. 

We  tested  our  conjectures  in  two  three-player  GPD's  as  described  by  equations  (1)  and  (2).  Our 
methodology  was  that  of  computer  tournaments. 

In both tournaments, implicit coalitions proved to be the key feature that distinguished the most
successful strategies (in terms of average score). In MITCS1 a simple discrete COALITION strategy
won and other coalition strategies fared well. However, a specific coalition-encouraging strategy,
COALENC, would have won had it been entered. In MITCS2, several different variants of
COALENC did surprisingly well, especially considering the differences in the payoff function from
MITCS1 to MITCS2. Among the COALENC entries, magnanimity seemed to distinguish the very
best algorithms.

Finally,  COALENC  strategies  held  up  very  well  in  a  variety  of  hypothetical  environments, 
although  at  least  one  alternative  algorithm  did  well  in  the  nastiest  of  environments. 

Despite  the  fine  showings  of  CEAVG3  and  the  other  COALENC  entries,  we  dare  not  make  our 
claims  too  strong.  The  GPD  is  a  rich  and  complex  problem  and  our  tournaments  only  begin  to  tap  its 
complexity.  Nonetheless,  we  do  feel  confident  that  implicit  coalitions  are  important  and  should  be 
considered  in  any  situation  modeled  by  a  GPD. 

As  in  all  research,  interesting  questions  remain.  Beyond  the  obvious  questions  of  more  than 
three  players,  alternative  payoff  functions,  and  still  more  complex  algorithms,  we  feel  that  further 
investigation  of  magnanimity  and  further  exploitation  of  continuous  action  are  warranted. 
Algorithms that have greater adaptability to recognize competitors are likely to prove interesting.

Beyond  computer  tournaments,  there  are  possibilities  for  GPD  experiments  on  human  subjects 
and  descriptive  research  to  determine  which  real-world  conflict  situations  are  best  modeled  by  GPD's 
and  implicit  coalitions.  Finally,  we  view  implicit  coalitions  as  an  excellent  concept  to  examine  the 
overlap  (or  differences)  in  the  approaches  used  by  cooperative  and  non-cooperative  game  theory  to 
study multiple-player conflict situations.